Figure 1: Qualitative summary of color–name bidirectional modeling. (a) Color-to-Name (C2N) recommendation: given an uncommon car color, our model retrieves plausible human-used names (e.g., brown, maroon) from a unified color–name embedding. (b) Name-to-Color (N2C) recommendation: given textual intents (e.g., light yellow green vs. khaki) for a white dress, the model returns RGB swatches that match each description. Trained on large-scale crowdsourced data with severe sparsity and imbalance, our contrastive framework (negative sampling + multi-task losses) supports both directions in a single representation, yielding substantially higher C2N accuracy and lower N2C perceptual error than prior methods.
Large-scale color datasets exhibit significant sparsity in name–color correspondences, which substantially limits the effectiveness of conventional methods. We propose a contrastive learning framework for color name generation and recommendation that addresses this sparsity through negative sampling, supporting two core tasks: color-to-name recommendation and name-to-color generation. Our framework employs a multi-task contrastive architecture with three key components: (1) a pre-trained Transformer-based name encoder, (2) an RGB encoder, and (3) an RGB generator. Negative sampling constructs positive–negative pairs, contrasting RGB encoder outputs with positive and negative name embeddings. We adopt a multi-objective optimization strategy that combines a binary cross-entropy loss for neural collaborative filtering with a mean squared error loss for name-to-RGB mapping. Experimental results demonstrate substantial improvements over baseline methods, achieving 71.26% Top-10 accuracy in color-to-name recommendation and reducing the CIELAB distance error to 26.61 in name-to-color generation.
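The loss composition described above can be sketched as follows. This is a minimal NumPy illustration under assumed shapes: the inner product stands in for the NCF scoring module, and the function and argument names are hypothetical, not the paper's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multitask_loss(rgb_emb, pos_name_emb, neg_name_embs, rgb_pred, rgb_true, lam=1.0):
    """Sketch of a BCE (contrastive, with sampled negatives) + MSE objective.

    rgb_emb:       (d,)   RGB-encoder output for one color
    pos_name_emb:  (d,)   embedding of the ground-truth name (positive)
    neg_name_embs: (k, d) embeddings of k sampled negative names
    rgb_pred/true: (3,)   generator output vs. target RGB, in [0, 1]
    lam:           weight balancing the two objectives (assumed)
    """
    # Score each (color, name) pair; inner product stands in for NCF here.
    pos_score = sigmoid(rgb_emb @ pos_name_emb)
    neg_scores = sigmoid(neg_name_embs @ rgb_emb)
    eps = 1e-12
    # Binary cross-entropy: the positive pair is labeled 1, negatives 0.
    bce = -np.log(pos_score + eps) - np.sum(np.log(1.0 - neg_scores + eps))
    # Mean squared error for the name-to-RGB generator.
    mse = np.mean((rgb_pred - rgb_true) ** 2)
    return bce + lam * mse
```

With aligned positive and anti-aligned negative embeddings the loss is small; flipping them drives it up, which is the gradient signal the contrastive setup relies on.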
Figure 2: Color distributions in the RGB unit cube for three color names: (a) green, (b) blue, and (c) teal. Each panel plots 1,000 unique RGB samples randomly drawn from the XKCD dataset [24].
Figure 3: Framework of our proposed approach.
Table 1: Ablation of framework components. "Enhanced RGB" uses a 16-dimensional input in place of 3-D RGB; "Neg. Sampling" denotes the negative sampling procedure; "NCF" is the neural collaborative filtering module (when disabled, similarity is the inner product between aligned embeddings); "MSE" is the regression loss in the RGB generator. "C2N" is color-to-name recommendation; "N2C (R/G)" are name-to-color recommendation/generation, reported as Δ𝐸 in CIELAB (lower is better).
Table 2: Quantitative results on the XKCD test set. C2N = color-to-name recommendation; N2C = name-to-color. For C2N we report Top-𝑘 accuracy (%), for N2C we report the perceptual color difference Δ𝐸 (CIELAB, lower is better). Generation time is per query.
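Since both tables report Δ𝐸 in CIELAB, here is a minimal sketch of how such a score can be computed, assuming the CIE76 definition of Δ𝐸, sRGB inputs, and the D65 white point (the excerpt does not specify the exact conversion used).

```python
import numpy as np

def srgb_to_lab(rgb255):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    c = np.asarray(rgb255, dtype=float) / 255.0
    # Undo the sRGB transfer function (linearize).
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ via the sRGB matrix (D65).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = m @ lin
    # Normalize by the D65 reference white.
    xyz /= np.array([0.95047, 1.0, 1.08883])
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return np.array([L, a, b])

def delta_e(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))
```

Under this definition, white and black are Δ𝐸 ≈ 100 apart, which gives a sense of scale for the N2C errors reported in Table 2.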
Figure 4: Qualitative results of the proposed framework. (a) Color-to-Name (C2N) recommendation: each panel shows the query color on the left (ground-truth swatch with its label) and our top-𝑘 predicted names on the right; within the recommendation list, the bold entry is the ground-truth label, indicating a correct hit. (b) Name-to-Color (N2C) recommendation: for each query name (title), we display the top-𝑘 RGB swatches predicted by our model from left to right; recommended colors are mutually JND-separated in CIE L∗a∗b∗ (𝜏 ≈ 2.3).
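The mutual JND separation described in Figure 4(b) can be sketched as a greedy filter over scored candidate swatches. The names and shapes below are assumptions, and the CIE76 Euclidean distance in CIELAB stands in for Δ𝐸.

```python
import numpy as np

def jnd_separated_topk(candidates_lab, scores, k, tau=2.3):
    """Greedy selection sketch: walk candidates from best-scoring to worst
    and keep a swatch only if its CIE76 distance to every already-kept
    swatch is at least tau (~one just-noticeable difference).

    candidates_lab: (n, 3) candidate colors in CIELAB
    scores:         (n,)   model relevance scores (higher is better)
    Returns the indices of up to k mutually separated swatches.
    """
    order = np.argsort(scores)[::-1]  # best-scoring first
    kept = []
    for i in order:
        if all(np.linalg.norm(candidates_lab[i] - candidates_lab[j]) >= tau
               for j in kept):
            kept.append(int(i))
        if len(kept) == k:
            break
    return kept
```

This keeps the recommendation list perceptually diverse: near-duplicate swatches within one JND of an already-selected color are skipped in favor of the next-best distinct candidate.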
This work is supported by the grants of the NSFC (No. 62502523, No. U2436209), the National Key R&D Program of China under Grant 2022ZD0160805, the Beijing Natural Science Foundation (L247027), the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China.